145 research outputs found

    Soft Dynamic Time Warping for Multi-Pitch Estimation and Beyond

    Many tasks in music information retrieval (MIR) involve weakly aligned data, where exact temporal correspondences are unknown. The connectionist temporal classification (CTC) loss is a standard technique to learn feature representations based on weakly aligned training data. However, CTC is limited to discrete-valued target sequences and can be difficult to extend to multi-label problems. In this article, we show how soft dynamic time warping (SoftDTW), a differentiable variant of classical DTW, can be used as an alternative to CTC. Using multi-pitch estimation as an example scenario, we show that SoftDTW yields results on par with a state-of-the-art multi-label extension of CTC. In addition to being more elegant in terms of its algorithmic formulation, SoftDTW naturally extends to real-valued target sequences. Comment: Accepted at ICASSP 202
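
The SoftDTW recursion mentioned in this abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes a squared Euclidean local cost and the usual three step sizes, and the names `soft_min` and `soft_dtw` are hypothetical. Replacing the hard minimum of classical DTW with a smoothed minimum is what makes the cost differentiable, and nothing in the recursion requires the target sequence to be discrete.

```python
import numpy as np


def soft_min(values, gamma):
    """Smoothed minimum -gamma * log(sum(exp(-v / gamma))), computed stably."""
    v = -np.asarray(values, dtype=float) / gamma
    m = v.max()
    return -gamma * (m + np.log(np.exp(v - m).sum()))


def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW cost between sequences x (N, D) and y (M, D)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    # Pairwise squared Euclidean distances (an assumed local cost).
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    n, m = cost.shape
    # Accumulated cost matrix with an extra border row and column.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prev = (acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = cost[i - 1, j - 1] + soft_min(prev, gamma)
    return acc[n, m]
```

As `gamma` approaches zero the value approaches the classical DTW cost; larger values of `gamma` smooth over more alignment paths.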

    Combinatorial problems arising from pooling designs for DNA library screening

    Colbourn (1999) developed a strategy for nonadaptive group testing when the items are linearly ordered and the positive items form a consecutive subset of all items. Müller and Jimbo (2004) improved his strategy by introducing the concept of 2-consecutive positive detectable matrices (2CPD-matrices), requiring that all columns and the bitwise OR-sums of each two consecutive columns are pairwise distinct. Such a matrix is called maximal if it has the maximal possible number of columns with respect to some obvious constraints. Using a recursive construction, they proved the existence of maximal 2CPD-matrices for any column size m ∈ N except for the case m = 3. Moreover, maximal 2CPD-matrices such that each column is of some fixed constant weight are constructed. This leads to pooling designs where each item appears in the same number of pools and all pools are of the same size. Secondly, we investigate 2CPD-matrices of some constant column weight τ ∈ N. We give a recursive construction of such matrices having the maximal possible number of columns. Thirdly, error correction capability of group testing procedures is essential in view of applications such as DNA library screening. We consider error-correcting 2CPD-matrices
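
The 2CPD condition stated in this abstract (all columns and all OR-sums of two consecutive columns pairwise distinct) can be checked directly. The sketch below is illustrative only: it encodes each column as an integer bitmask, assumes a linear (non-cyclic) item ordering, and uses the hypothetical names `consecutive_outcomes` and `is_2cpd`.

```python
def consecutive_outcomes(columns):
    """Map each test outcome to the run (start, length) of items producing it.

    `columns` is a list of ints; bit r of columns[i] is 1 if item i is in pool r.
    Only runs of one or two consecutive positive items are considered.
    Returns None if two different runs yield the same outcome.
    """
    outcomes = {}
    for i, col in enumerate(columns):
        runs = [(col, (i, 1))]
        if i + 1 < len(columns):
            # Outcome when items i and i + 1 are both positive.
            runs.append((col | columns[i + 1], (i, 2)))
        for outcome, run in runs:
            if outcome in outcomes:
                return None
            outcomes[outcome] = run
    return outcomes


def is_2cpd(columns):
    """True if all columns and consecutive OR-sums are pairwise distinct."""
    return consecutive_outcomes(columns) is not None
```

When the check succeeds, the returned dictionary doubles as a decoder: looking up an observed pool outcome gives the start index and length of the positive run.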

    Flabase: towards the creation of a flamenco music knowledge base

    Online information about flamenco music is scattered over different sites and knowledge bases. Unfortunately, there is no common repository that indexes all these data. In this work, information related to flamenco music is gathered from general knowledge bases (e.g., Wikipedia, DBpedia), music encyclopedias (e.g., MusicBrainz), and specialized flamenco websites, and is then integrated into a new knowledge base called FlaBase. As resources from different data sources do not share common identifiers, a process of pair-wise entity resolution has been performed. FlaBase contains information about 1,174 artists, 76 palos (flamenco genres), 2,913 albums, 14,078 tracks, and 771 Andalusian locations. It is freely available in RDF and JSON formats. In addition, a method for entity recognition and disambiguation for FlaBase has been created. The system can recognize and disambiguate FlaBase entity references in Spanish texts with an f-measure value of 0.77. We applied it to biographical texts present in FlaBase. By using the extracted information, the knowledge base is populated with relevant information and a semantic graph is created connecting the entities of FlaBase. Artist relevance is then computed over the graph and evaluated according to a flamenco expert's criteria. Accuracy of results shows a high degree of quality and completeness of the knowledge base

    trackswitch.js: A Versatile Web-Based Audio Player for Presenting Scientific Results

    trackswitch.js is a versatile web-based audio player that enables researchers to conveniently present examples and results from scientific audio processing applications. Based on a multitrack architecture, trackswitch.js allows a listener to seamlessly switch between multiple audio tracks, while synchronously indicating the playback position within images associated with the audio tracks. These images may correspond to feature representations such as spectrograms or to visualizations of annotations such as structural boundaries or musical note information. The provided switching and playback functionalities are simple yet useful tools for analyzing, navigating, understanding, and evaluating results obtained from audio processing algorithms. Furthermore, trackswitch.js is an easily extendible and manageable software tool, designed for non-expert developers and inexperienced users. Offering a small but useful selection of options and buttons, trackswitch.js requires only basic knowledge to implement a versatile range of components for web-based audio demonstrators and user interfaces. Besides introducing the underlying techniques and the main functionalities of trackswitch.js, we provide several use cases that indicate the flexibility and usability of our software for different audio-related research areas

    Automatic Synchronization of Music Data in Score-, MIDI- and PCM-Format

    In this paper we present algorithms for the automatic time-synchronization of score-, MIDI- or PCM-data streams which represent the same polyphonic piano piece

    Local Periodicity-Based Beat Tracking for Expressive Classical Piano Music

    To model the periodicity of beats, state-of-the-art beat tracking systems use "post-processing trackers" (PPTs) that rely on several empirically determined global assumptions for tempo transition, which work well for music with a steady tempo. For expressive classical music, however, these assumptions can be too rigid. With two large datasets of Western classical piano music, namely the Aligned Scores and Performances (ASAP) dataset and a dataset of Chopin's Mazurkas (Maz-5), we report on experiments showing the failure of existing PPTs to cope with local tempo changes, thus calling for new methods. In this paper, we propose a new local periodicity-based PPT, called predominant local pulse-based dynamic programming (PLPDP) tracking, that allows for more flexible tempo transitions. Specifically, the new PPT incorporates a method called "predominant local pulses" (PLP) in combination with a dynamic programming (DP) component to jointly consider the locally detected periodicity and beat activation strength at each time instant. Accordingly, PLPDP accounts for the local periodicity, rather than relying on a global tempo assumption. Compared to existing PPTs, PLPDP particularly enhances the recall values at the cost of a lower precision, resulting in an overall improvement of F1-score for beat tracking in ASAP (from 0.473 to 0.493) and Maz-5 (from 0.595 to 0.838). Comment: Accepted to IEEE/ACM Transactions on Audio, Speech, and Language Processing (July 2023)

    Virtual reality platform for sonification evaluation

    Presented at the 21st International Conference on Auditory Display (ICAD2015), July 6-10, 2015, Graz, Styria, Austria. In this paper we propose a game-based virtual reality platform for evaluation of sonification techniques. We study the task of localization of stationary objects in virtual reality using auditory cues. We further explore sonification techniques and compare the performance in this task using the proposed platform. The virtual reality environment is developed using Unity3D (game engine) and an Oculus Rift, a head mounted virtual reality display. Parameter mapping sonification techniques are employed to map the position of the object in virtual space to sound. Hence, the framework defined here constitutes an auditory virtual reality environment. This auditory display interface is subjectively evaluated in a stationary object localization task. A statistical analysis of the subjective and objective measures of the listening test is performed, resulting in a robust and scientific evaluation of the sonification methods
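
To illustrate what a parameter mapping from object position to sound might look like, here is a small sketch. The abstract does not specify the mappings used, so everything here is an assumption: distance is mapped to pitch, azimuth to stereo pan, and the function name `position_to_sound` and its default ranges are hypothetical.

```python
import math


def position_to_sound(x, z, max_distance=10.0, f_min=220.0, f_max=880.0):
    """Map a horizontal-plane position to (frequency in Hz, stereo pan).

    x is the offset to the listener's right, z the offset straight ahead.
    """
    distance = min(math.hypot(x, z), max_distance)
    azimuth = math.atan2(x, z)  # 0 = straight ahead, +pi/2 = hard right
    # Exponential interpolation so that equal distance steps give equal
    # pitch intervals; nearer objects sound higher (assumed mapping).
    frequency = f_max * (f_min / f_max) ** (distance / max_distance)
    pan = math.sin(azimuth)  # -1 = full left, +1 = full right
    return frequency, pan
```

With this particular mapping, objects directly in front of and directly behind the listener receive the same pan value, so a further cue would be needed to separate them.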

    Case Study "Beatles Songs" - What can be Learned from Unreliable Music Alignments?

    As a result of massive digitization efforts and the world wide web, there is an exploding amount of available digital data describing and representing music at various semantic levels and in diverse formats. For example, in the case of the Beatles songs, there are numerous recordings including an increasing number of cover songs and arrangements as well as MIDI data and other symbolic music representations. The general goal of music synchronization is to align the multiple information sources related to a given piece of music. This becomes a difficult problem when the various representations reveal significant differences in structure and polyphony, while exhibiting various types of artifacts. In this paper, we address the issue of how music synchronization techniques are useful for automatically revealing critical passages with significant difference between the two versions to be aligned. Using the corpus of the Beatles songs as test bed, we analyze the kind of differences occurring in audio and MIDI versions available for the song

    Wagner Ring Dataset: A Complex Opera Scenario for Music Processing and Computational Musicology

    This paper introduces the Wagner Ring Dataset (WRD), a multi-modal and multi-version resource on the large-scale opera cycle Der Ring des Nibelungen by Richard Wagner. The Ring comprises four music dramas organized into eleven acts and 21 939 measures in total. Concerning sheet music, we processed a publicly available piano reduction (822 pages) of the full score with optical music recognition followed by extensive manual corrections to create a high-quality, machine-readable symbolic score. Concerning audio data, our corpus covers 16 recorded performances of the full Ring (three of which are publicly available thanks to copyright expiry), each lasting about 14–15 hours. To musically synchronize these versions among each other, we manually annotated all measure positions for three performances, which we transferred to the remaining performances via automated synchronization techniques. The dataset further comprises annotations of key and time signatures, scenes, and singing voice regions (libretto). Moreover, we provide note event annotations for all performances derived from the piano score. The WRD thus constitutes a comprehensive resource for developing algorithms for various music information retrieval tasks, complementing existing datasets with a complex opera scenario. For computational musicology, the WRD serves as a structured dataset that allows for studying the composition and performances of the Ring
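
The transfer of measure annotations from an annotated performance to another one can be sketched as interpolation along an alignment path. This is an illustration under assumptions rather than the procedure used for the WRD: it presumes a monotone warping path of (reference time, target time) pairs in seconds is already available from some synchronization method, and the name `transfer_annotations` is hypothetical.

```python
import numpy as np


def transfer_annotations(ref_times, warping_path):
    """Map annotation times from a reference performance to a target one.

    `warping_path` is a sequence of (reference_time, target_time) pairs,
    assumed monotonically non-decreasing in both coordinates.
    """
    path = np.asarray(warping_path, dtype=float)
    # np.interp expects increasing x-values, so keep the first target
    # time for each repeated reference time.
    ref, first = np.unique(path[:, 0], return_index=True)
    return np.interp(ref_times, ref, path[first, 1])
```

For example, with the path `[(0, 0), (1, 2), (2, 4)]` the target performance is uniformly twice as slow, and measure positions at 0.5 s and 1.5 s in the reference map to 1.0 s and 3.0 s in the target.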